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Neuroimaging signal intensity measures underlying pliysiology at each voxel unit. Tine brain- 
wide distribution of signal intensities may be used to assess gross brain abnormality. To 
compare distributions of brain image data between groups, f-tests are widely applied. This 
approach, however, only compares group means and fails to consider the shapes of the 
distributions. We propose a simple approach for estimating both subject- and group-level 
density functions based on the framework of Gaussian mixture modeling, with mixture 
probabilities that are testable between groups. We demonstrate this approach by applica- 
tion to the analysis of fractional anisotropy image data for assessment of aging effects in 
white matter. 

Keywords: Gaussian mixture model, density function estimation, aging effects, fractional anisotropy, diffusion 
tensor imaging 



INTRODUCTION 

Fractional anisotropy (FA), a scalar measure derived from diffu- 
sion tensor imaging (DTI), indexes the degree of anisotropy of 
water diffusion in brain tissue. In normal white matter (WM), 
water diffusion is highly constrained to predominantly move par- 
allel to the long axis of axon bundles, or nerve fibers. High FA 
is thus characteristic of normal WM and may be an indicator of 
its health. Low FA, on the other hand, can reflect loss of WM 
microstructural elements in tissues, which normally have high FA 
and may be an indicator of disease. Low WM FA is associated with 
demyelinating disease, dementia, traumatic brain injury (TBI), 
and normal aging. 

Inter-subject variability of WM microstructure has been 
reported with normal subjects as well as patients with various 
neurodegenerative diseases ( 1-6). Neurodegeneration is also a fea- 
ture of normal aging and WM effects of aging disrupt cerebral 
connectivity, leading to cognitive dysfunction (7, 8). Commonly 
applied statistical analyses, e.g., f-tests at each voxel, may be inher- 
ently insensitive to disease pathology due to inter-subject spatial 
variation (2-4, 9). A whole brain (or whole WM) histogram 
approach has been used in some studies to address this limita- 
tion (10, II). For example, Benson et al. (10) estimated kurtosis, 
skewness, peak height, and mean from histograms of WM FA 
in TBI patients and normals. They aggregated these measures to 
test for group differences in the shapes of the individuals' his- 
tograms. This analysis demonstrated that the FA distribution of 



TBI patients exhibited higher kurtosis, higher peak height, and 
greater skew toward higher FA, but lower mean, compared to 
those derived from controls. Although this approach used standard 
statistical summary measures, differences among the shapes of dis- 
tributions between groups are not easily understood using these 
summary measures. In addition, these summary measures are not 
proper for description of multimodal distributions, a special case, 
which could result from a mixture of multiple distributions (12). 
Therefore, we propose that estimation and comparison of den- 
sity functions between groups based on a mixture distribution 
approach will be more relevant to brain imaging data, which can 
exhibit unusual distributions. 

In this study, we propose a simple approach to estimate subject- 
level density functions. The technique is based on a Gaussian 
mixture model (GMM), which assigns subject-specific mixing 
probabilities to latent underlying Gaussian densities a posteriori 
to characterize an overall distribution of each subject. Estimation 
of group-level density functions will be based on the estimated 
subject-level density functions. Differences between groups there- 
fore only depend on the composition of mixture probabilities to 
underlying Gaussian densities, which lead to an easy and intu- 
itive comparison between groups. For instance, in a simple GMM 
assuming two Gaussian density components with equal variance 
constraint, a mixture density function with higher mixing proba- 
bility to the Gaussian components with lower mean is character- 
ized as a distribution with lower mean and positive skew, while 
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the opposite is characterized as a distribution with higher mean 
and negative skew. Additionally, the mixture density function with 
mixing probabilities (1/2, 1/2) is a symmetric distribution with 
the highest variance in the data. We apply the proposed method 
to normal control subjects, to examine the effects of aging on the 
brain-wide distribution of FA. 

GENERAL GMM 

A general form of a GMM with m components can be expressed 
as follows: 



/(zy;e) = ^t(fc)(^t(zy). 



(1) 



k=l 



where Zy is FA measurement of the ;'-th voxel {j= 1, . . ., N) 
observed from the !-th subject (;'= 1, . . ., n). The density func- 
tion is assumed to be a convex combination of m latent 

Gaussian densities {^kiZif} = ^ { ^'' a^'" ^ ,^ = 1, • • • ,ni^ with 
corresponding m mixing probabilities [x(k),k=l, . . ., m], where 
(()(.) is the standard normal density function, and 9 = 0j-, and 
Xj^,k=l,.. ., m). The likelihood function based on model (1) is 
expressed as follows: 



L(z;e)=nn 



,=1;=1 



.k=l 



We assume that the variances are the same across the m- 
Gaussian densities, i.e., ai = . . .a^ = o, which reduces (/c= 1, 
. . ., m) to ^i^(Zij) = (j) (^-^^-^^ in Eq. 1. The m latent Gaussian 
densities then differ only by their centers. We order the centers 
of the m-Gaussian densities as [Xi < . . . < [im. This parameteriza- 
tion gives easy interpretation of results for comparison of density 
functions across subgroups. For example, a density function with 
higher mixing probabilities for low order Gaussian densities will 
have a smaller mean than that with higher mixing probabilities for 
higher order Gaussian densities; a density with mixing probability 
of 1/2 to each of the two Gaussian densities (lowest, highest) will 
have the largest variance than any other combination of mbcture 
probabilities. 

To estimate the parameters 0 of the general GMM with equal 
variance constraint, we applied an expectation maximization (EM) 
algorithm (13); of note, a Bayesian mixture modeling approach 
(14, 15) is an alternative approach. Specifically, the EM algorithm 
treats the mixture model in Eq. (1) as an incomplete likelihood 
function with missing membership information for each obser- 
vation. In each iteration of the EM algorithm, expectation-step 
(E-step) computes expectation of complete specification of log- 
likelihood function with respect to membership values with given 
data and estimated parameters, and maximization-step (M-step) 
computes maximum likelihood estimates of parameters speci- 
fied in the likelihood function with the expected membership 
values (16). The applied EM algorithm is provided in Table 1. 
The total number of parameters with equal variance condition 
for m-Gaussian densities is 2m [=m (means) -|-m— 1 (mbdng 
probabilities) -|- 1 (variance)]. To determine the optimal number 



Table 1 | EM algorithm adopted for this study. 

Probability density function 
E-step 



Hz) = E ^k(z) 
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M-step 
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A variable noted with (t) is the estimated value for the variable at the t-th iteration. 

of the latent Gaussian densities, c|)t-'s ik=l, . . ., m), we used 
the Akaike information criterion (AIC), which is expressed as 

n N I m \ 

2d -2 log \ikiz\e) = 2d-2j2 E log E t(^:)(^ic(z) .Sothat 

, = 1; = 1 \k=i I 

the optimal number of Gaussian densities (m) is associated with a 
minimum AIC value. 

ESTIMATION OF SUBJECT- AND GROUP-LEVEL DENSITY FUNCTIONS 

In this section, we propose a method to estimate subject- and 
group-level density functions and their associated mixing proba- 
bilities. This approach fits the general GMM of Eq. 1 to the fuU 
dataset (Z,j, i=l, . . ., n; j =1, . . ., N), all voxels from all sub- 
jects, and estimates subject- and group-level density functions a 
posteriori. 

Under the general GMM, the membership probability of z,j 
to A:-th Gaussian density, denoted by ty(/c), is obtained based on 
Bayes theorem as follows: 



m ' 
E <n h (Zy) 

1=1 



(2) 



which results in E '^iji^) = 1- 

Jc=l 

The estimated subject-level density function fi for the !-th 
subject can be expressed as follows: 



f,iz,j; 6) = ^ x,(k) (t)i (zy), 



(3) 



it=i 



with (^kizij) {k=l, . . ., m) estimated from the general GMM Eq. 
(1) using all voxel data from all subjects. The parameter Xi{k) in 
Eq. 3 is the mixing probability of the fc-th Gaussian density for 
the !-th subject, which we estimate as an average of membership 
probabilities of Z,j [j = I, N) as follows: 



T,(fc) 



1 

N 



Y.mk), 



Frontiers in Public IHeaith | Epidemiology 



April 2014 I Volume 2 | Article 32 | 2 



Kim et al. 



A GMM for estimating subgroup density function 



where Xij{k) was estimated based on Eq. 2. Similarly, the esti- 
mated group-level density function fg* for the ^-th group can be 
expressed as follows: 

m 

fg* (-Zy) = J2 V *t (z,j), (4) 
k=l 

where Xg* (k) in Eq. 4 is the mixing probability of the fc-th Gauss- 
ian density for the g-th group, which we estimate as an average of 
Ti{k) of the subjects in the g-th group as follows: 

« isGg ^ isGg ]=1 

where Ug is the number of subjects in the ^-th group. 

EXAMPLE 
SUBJECTS 

The Albert Einstein College of Medicine Institutional Review 
Board (IRB) approved and monitored this study. Twenty-eight 
normal subjects without history of head injury, cardiovascular 
or cerebrovascular disease, diabetes, or neurological or psychiatric 
disease were recruited between August 2006 and May 20 10 through 
advertisements. Demographic data for the 28 normal subjects are 
summarized in Table 2. 

DIFFUSION TENSOR IMAGE ACQUISITION AND IMAGE DATA 
PREPROCESSING 

Imaging was performed using a 3.0 T MRI scanner (Achieva; 
Philips Medical Systems, Best, The Netherlands) with an eight- 
channel head coil (Sense Head Coil; Philips Medical Systems). 
Tl -weighted whole-head structural imaging was performed using 
sagittal three-dimensional magnetization-prepared rapid acqui- 
sition gradient echo (MP-RAGE; TR/TE = 9.9/4.6 ms; field of 
view, 240 mm^; matrix, 240 x 240; and section thickness, 1 mm). 
T2-weighted whole-head imaging was performed using axial 
two-dimensional turbo spin-echo (TR/TE = 4000/100 ms; field 
of view, 240mm^; matrix, 384 x 512; and section thickness, 
4.5 mm). DTI was performed using single-shot echo-planar 
imaging (TR/TE = 3800/88 ms; field of view, 240 mm^; matrix, 
112 X 89; section thickness, 4.5 mm; independent diffusion sen- 
sitizing directions, 32; and fo = 800s/mm^). The images were 
preprocessed as described previously (4, 9). 

DEMONSTRATION OF THE PROPOSED METHODS USING EXAMPLE FA 
IMAGE DATA 

Fractional anisotropy datasets were normalized prior to fitting 
the proposed models. The normalization was necessary because it 
is well-known that FA is heterogeneous across brain regions; for 
example, higher FA is characteristic of deep WM such as the corpus 
callosum, and lower FA is typical of peripheral WM. Specifically, 
we normalized each FA measurement by Zy = ^^^^ with mean 

(xj) and standard deviation (sj) of n subjects (z = 1, . . ., n) at each 
voxel (j= 1, . . .,N). 

Subjects were divided into four age groups: 20-29, 30-39, 40- 
49, and 50-59 years. Each age group consisted of seven subjects 



Table 2 | Subjects' demographic characteristics. 

Gender Number Mean years Minimum Maximum 

of of education years of years of 

subjects (SD) education education 



All ag6s 


All gsndsrs 


28 


13.6 (1.6) 


12 


18 




Female 


16 


13.9 (1.8) 


12 


18 




Male 


12 


13.2 (1.3) 


12 


16 


20-29 


All 


7 


14.1 (1.7) 


12 


16 




Female 


4 


14.3 (1.7) 


12 


16 




Male 


3 




1 2 


1 D 


30-39 


All 


7 


12.9 (1.0) 


12 


14 




Female 


4 


12.8 (1.0) 


12 


14 




Male 


3 


13.0 (1.0) 


12 


14 


40-49 


All 


7 


13.4 (1.6) 


12 


16 




Female 


4 


14.3 (1.7) 


12 


16 




Male 


3 


12.3 (0.6) 


12 


13 


50-59 


All 


7 


13.9 (2.0) 


12 


18 




Female 


4 


14.3 (2.6) 


12 


18 




Male 


3 


13.3 (1.2) 


12 


14 



(four women and three men). Age groups were matched for years 
of education, differing by no more than 2 years; no significant 
difference in years of education was detected among age groups 
(p = 0.76). Two Gaussian densities were required for the approach 
based on the AIC. 

In Figure 1, estimated density functions across the entire con- 
trol group (n = 28) and each of the four age groups are presented 
(/ (z) ,fg* (z) ,g = 1, 2, 3, 4). Inference on the difference in shapes 
of FA distribution across age groups was performed by estimating 
mixing probabilities from the proposed estimation method; these 
results are shown in Table 3 and Figure 2. We order estimated 
Gaussian densities {(^i) by their centers, in ascending manner; 
thus, we identify <\>i as the Gaussian density with the lowest center. 

Mixing probabilities of the two Gaussian densities, k=l and 
k = 2, show opposite patterns of change across age groups. As 
age increases the mixing probability of the first Gaussian den- 
sity (A:= 1) increases while that of the second Gaussian density 
{k = 2) decreases. Box plots describing distributions of subject- 
level estimated mixing probabilities to the second Gaussian density 
[xiik = 2), ! = !,..., n] are provided for all age groups in Figure 2. 
While mixing probabilities to the second Gaussian density (k = 2) 
were lower for older age group in that lower mean FA for older 
age group, higher between-subject variance is noted in Figure 2. A 
Kruskal-Wallis test was performed to compare subject-level mix- 
ing probabilities between Group = 1 and Group = g(^ = 2,3,4).A 
significant difference in the mixing probability was found between 
Groups 1 (20-29 years) and 4 (50-59 years) with p = 0.048 with 
degrees of freedom (1, 12). This significantly lower mean mixing 
probability to the Gaussian density with a greater mean implies 
that FA declines significantly over the age of 50 years. This pattern 
also implies lower intra-subject variance in FA distribution for the 
age group (50-59 years) because of higher mixing probability to 
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FIGURE 1 I Estimated age group-wise density functions. Estimated density function for all and each age group is demonstrated; (A) from all subjects, 
(B) from subjects aged 20-29, (C) from subjects aged 30-39, (D) from subjects aged 40-49, and (E) from subjects aged 50-59. 



the first Gaussian density {k= 1). While all age groups showed 
positively skewed FA distributions (Table 3), the youngest group, 
aged 20-29 years, showed the greatest skewness to the right. How- 
ever, the Kruskal-Wallis test, which compares all four groups was 
not significant {p = 0.141, df = 3, 24). 

DISCUSSION 

The proposed approach for subgroup density estimation with 
GMM allows group comparison based on shape parameters rep- 
resented by the relative mixing probabilities of the latent Gaussian 
densities. Since Gaussian densities are estimated based on the 
measurements from all voxels of all subjects, estimated individual 
densities will differ only by their own mixing probabilities. Indi- 
vidual densities are thus characterized by comparing the estimated 
mixing probabilities. Although kernel density estimation (KDE) 



Table 3 | Estimated mixing probabilities for each age group. 



Age 


Tg. (k = 1) (s.e.) 


Tg. (fe = 2) (s.e.) 


All 


0.7488 (0.0076) 


0.2512 (0.0076) 


20-29 


0.7277 (0.0086) 


0.2723 (0.0086) 


30-39 


0.7337 (0.0075) 


0.2663 (0.0075) 


40-49 


0.7481 (0.0150) 


0.2519 (0.0150) 


50-59 


0.7873 (0.0184)=' 


0.2127 (0.0184)= 



'Two-sided p-value < 0.05. Each Gaussian density is specified as n , = ~0.26, 
\i2 = 0.78, and a = 0.87. 



(17) is a widely applied statistical approach for density estima- 
tion, a density function estimated by KDE does not provide shape 
parameters for comparison between groups because the parameter 
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Group 



FIGURE 2 I Box plots of subject-wise mixing probabilities by each age 
group (G= 1, 2, 3, 4). Subject-level estimated mixing probability to the 
second Gaussian density [t, (fc = 2), / = 1 n] is demonstrated for each of 



the age groups (6 = 1 , 2, 3, 4). Outliers in each box plot, marked with red + 
signs, are defined as values that are more than 1.5 times the interquartile 
range away from the top or bottom of the box. 



determining the shape of a density function by KDE is only the 
bandwidth for the chosen kernel function. In contrast, mixing 
probabilities estimated in the present study enable comparison of 
shapes between groups, which we have demonstrated with a real 
FA data set. 

The GMM approach proposed herein was designed to discrim- 
inate density functions across different subgroups. This approach 
implicitly assumes that the density function derived from all vox- 
els will represent a mixture of at least two Gaussian components. 
Testing for differences in the shapes of the density functions 
from different groups is then possible by testing the means of 
mixing rates assigned to m-Gaussian densities (where m > 2) 
between groups. However, if one Gaussian density function (i.e., 
m = 1 ) is sufficient to fit the entire distribution, discrimination 
of density functions between different subgroups is not available 
with the proposed approach. Additionally, the proposed GMM 
is a highly parsimonious approach in that it does not incor- 
porate any subject- or group-specific shape parameters in the 



likelihood function; further improvement may be achieved by their 
inclusion. 

A GMM with unequal variance assumption resulted in little 
difference in fitting performance compared with the proposed 
approach with equal variance assumption (data not shown). 
Nevertheless, application of non-Gaussian mixtures could be 
employed, study of which is beyond the scope of the present paper 
and should serve as a future study. 

Initial application of the method to human FA datasets revealed 
that the distribution of FA differs significantly between subjects 
aged 20-29 years and those aged 50-59 years. Age-related change 
in FA distribution was found in mean, variance (intra-subject, 
between-subject), and skew. Since aging is a feature of neurodegen- 
eration, this finding may have an implication to other neurodegen- 
erative diseases, e.g., dementia and Alzheimer's disease. Since the 
present demonstration is based on a small sample, further exami- 
nations with larger samples are warranted to fully characterize the 
utility of this approach and the age effects it has revealed. 
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